Data Replication Assignment
Introduction
The paper, Multi-scalar drivers of biodiversity: local management mediates wild bee community response to regional urbanization, by Ballare et al., that I am reanalysing data from collected bees from 40 sites. 20 of the sites are managed as agricultural sites, and 20 of them are maintained as grasslands. The study primarily wanted to see if there was an interaction between local and landscape level complexity by using data collected in a 50x50m plot (local) such as % bare ground and % vegetation, and data collected at a 2km range (landscape) such as Heterogeniety and % semi-nautral land cover. They found that bee abundance and community diversity in locally simple habitats, such as in agricultural sites, saw a positive response to landscape level habitat complexity. The data that I imported from this paper, included all of the bees that they collected (over 12,000) labeled by site and sample period collected, the site data which included the percent cover of bare ground, vegeation, semi-natural habitat, and amount of heterogenous patches around the site, etc., and the plant data, including counts of every species found at each site.
In the reanalysis I will be calculating the mean abundance at each site, as well as at only the agricultural or grassland sites. I then recreate two of the final linear mixed models from the paper that convey the primary drivers for species abundance and chao richness at the sites. Lastly, I recreate figure 3 from the paper using ggplot to plot various pollinator abundances and richnesses in relation to either % bare ground or plant richness.
#Loading in the datasets
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(ggplot2)
Bees<- read_csv("Bee Table Dryad_010119.csv")
## Parsed with column specification:
## cols(
## Site = col_character(),
## Site_City = col_character(),
## Site_County = col_character(),
## Trap_Type = col_character(),
## sample_start_date = col_character(),
## sample_end_date = col_character(),
## insect_family = col_character(),
## insect_tribe = col_character(),
## insect_genus = col_character(),
## insect_species = col_character(),
## insect_sex = col_character(),
## nest_location = col_character(),
## size = col_character()
## )
Plants<- read_csv("Floral Data Dryad.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## Site = col_character(),
## `Floral Sample Date` = col_character()
## )
## See spec(...) for full column specifications.
Sites<- read_csv("Site and Landscape Dryad.csv")
## Parsed with column specification:
## cols(
## Site = col_character(),
## Management = col_character(),
## `Veg Sample Date` = col_character(),
## `% Vegetation` = col_double(),
## `% Litter` = col_double(),
## `% Bare` = col_double(),
## `% Rocky/Other` = col_double(),
## `% Canopy Cover` = col_double(),
## `Heteogeniety (# Patches)` = col_double(),
## `% Developed` = col_double(),
## `% Crop` = col_double(),
## `% Semi-Natural` = col_double(),
## `% Other` = col_double()
## )
Sites_full<- read_csv("Site and Landscape Dryad_full.csv")
## Parsed with column specification:
## cols(
## Site = col_character(),
## Management = col_character(),
## `Veg Sample Date` = col_character(),
## `% Vegetation` = col_double(),
## `% Litter` = col_double(),
## `% Bare` = col_double(),
## `% Rocky/Other` = col_double(),
## `% Canopy Cover` = col_double(),
## `Heteogeniety (# Patches)` = col_double(),
## `% Developed` = col_double(),
## `% Crop` = col_double(),
## `% Semi-Natural` = col_double(),
## `% Other` = col_double()
## )
Visualization of Data
In the paper that I chose, they included some descriptive statistics such as the overall mean bee abundance found at each site, as well as the mean abundance found at the two management types, argicultural and grassland. These descriptive statistics are calculated and displayed in the chunk below.
#exploratory data analysis
head(Bees)
## # A tibble: 6 x 13
## Site Site_City Site_County Trap_Type sample_start_da⦠sample_end_date
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Bast⦠Bastrop Bastrop Pan Trap 6/14/13 6/15/13
## 2 Bast⦠Bastrop Bastrop Pan Trap 6/14/13 6/15/13
## 3 Bast⦠Bastrop Bastrop Pan Trap 5/22/13 5/23/13
## 4 Bast⦠Bastrop Bastrop Pan Trap 5/22/13 5/23/13
## 5 Bast⦠Bastrop Bastrop Pan Trap 5/22/13 5/23/13
## 6 Bast⦠Bastrop Bastrop Pan Trap 5/22/13 5/23/13
## # ⦠with 7 more variables: insect_family <chr>, insect_tribe <chr>,
## # insect_genus <chr>, insect_species <chr>, insect_sex <chr>,
## # nest_location <chr>, size <chr>
head(Plants)
## # A tibble: 6 x 231
## Site `Floral Sample ⦠ABES ACMI1 ACRO1 ALCA1 ALCE ALDR ALSC AMXX1 ANGR2
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bast⦠5/22/13 0 0 0 0 0 0 0 0 0
## 2 Bast⦠6/14/13 0 0 0 0 0 0 0 0 0
## 3 Bast⦠7/2/13 0 0 0 0 0 0 0 0 0
## 4 Brac⦠5/27/13 0 0 0 0 0 0 0 0 0
## 5 Brac⦠6/18/13 0 0 0 0 0 0 0 0 0
## 6 Brac⦠7/5/13 0 0 0 0 0 0 0 0 0
## # ⦠with 220 more variables: ANMA3 <dbl>, ASTU <dbl>, ASVI2 <dbl>,
## # ASXXXX6 <dbl>, ASXXXX7 <dbl>, ASXXXX8 <dbl>, BIAM1 <dbl>, BOCO <dbl>,
## # CAAN4 <dbl>, CABE1 <dbl>, CAIN1 <dbl>, CAIN2 <dbl>, CAVI1 <dbl>,
## # CEAM1 <dbl>, CEME2 <dbl>, CHAS <dbl>, CHTA1 <dbl>, CHXX1 <dbl>,
## # CITE2 <dbl>, CIVI <dbl>, CNTE <dbl>, COBA2 <dbl>, COEQ <dbl>, COER <dbl>,
## # COGR10 <dbl>, COSA <dbl>, COXX1 <dbl>, CRMO6 <dbl>, CUME <dbl>, CUPE <dbl>,
## # CUSA4 <dbl>, DACA6 <dbl>, DACO2 <dbl>, DALA5 <dbl>, DECAV2 <dbl>,
## # DRAM <dbl>, ENPE4 <dbl>, ERCI6 <dbl>, ERMO2 <dbl>, ERST3 <dbl>,
## # ERVES <dbl>, ERXX8 <dbl>, ERXX9 <dbl>, EVSE <dbl>, FAXXXX1 <dbl>,
## # FAXXXX2 <dbl>, FAXXXX3 <dbl>, FOVU <dbl>, FRAN <dbl>, GACO5 <dbl>,
## # GAPA6 <dbl>, GAPU <dbl>, GASU <dbl>, GAXX1 <dbl>, GAXX2 <dbl>, GAXX3 <dbl>,
## # GETE2 <dbl>, GEXXXX1 <dbl>, GEXXXX2 <dbl>, GLBI2 <dbl>, HEAC <dbl>,
## # HEAM <dbl>, HEAMB2 <dbl>, HEAN3 <dbl>, HEDR <dbl>, HELA <dbl>, HETE3 <dbl>,
## # HYSC <dbl>, INMI <dbl>, IPTRT <dbl>, IPXX1 <dbl>, KRLA <dbl>, KRWR <dbl>,
## # LAHO <dbl>, LAIN <dbl>, LASA3 <dbl>, LAXX1 <dbl>, LERE4 <dbl>, LEXX1 <dbl>,
## # LITE3 <dbl>, MEMI <dbl>, MEOF <dbl>, MEOL <dbl>, MEXX5 <dbl>, MINU6 <dbl>,
## # MOCI <dbl>, NELU2 <dbl>, OCBA <dbl>, OESP2 <dbl>, OEXXXX1 <dbl>,
## # OPEN3 <dbl>, ORVU <dbl>, OXDI2 <dbl>, PAAC3 <dbl>, PECR2 <dbl>,
## # PHNO2 <dbl>, PHPH2 <dbl>, PHPI <dbl>, PHPO3 <dbl>, PHRO3 <dbl>, ā¦
head(Sites)
## # A tibble: 6 x 13
## Site Management `Veg Sample Dat⦠`% Vegetation` `% Litter` `% Bare`
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Bast⦠Cultivated 5/22/13 0.377 0.531 0.0583
## 2 Bast⦠Cultivated 6/14/13 0.381 0.434 0.184
## 3 Bast⦠Cultivated 7/2/13 0.319 0.471 0.210
## 4 Brac⦠Grassland 5/27/13 0.583 0.218 0.0781
## 5 Brac⦠Grassland 6/18/13 0.629 0.202 0.0938
## 6 Brac⦠Grassland 7/5/13 0.543 0.261 0.0948
## # ⦠with 7 more variables: `% Rocky/Other` <dbl>, `% Canopy Cover` <dbl>,
## # `Heteogeniety (# Patches)` <dbl>, `% Developed` <dbl>, `% Crop` <dbl>, `%
## # Semi-Natural` <dbl>, `% Other` <dbl>
#creating a category to define the species combining the genus and species name
Bees$full_species_name<- paste(Bees$insect_genus, Bees$insect_species, sep = "_")
Bees<- Bees%>% filter(nest_location != "NA") #takes out ant NAs
#averaging the %vegetation/bare ground/litter at each site across the three time points
#Sites2<- Sites %>% group_by(Site) %>% summarise(percent_veg = mean(`% Vegetation`), percent_litter = mean(`% Litter`), percent_bare = mean(`% Bare`), percent_rocky = mean(`% Rocky/Other`), percent_canopy = mean(`% Canopy Cover`))
#filtering out the rows of sites with NAs
#Sites3<- filter(Sites, `Heteogeniety (# Patches)`>0)
#combining the two data frames
#Sites4<- merge(Sites3, Sites2, by = "Site")
#ended up not having to use the code that I commented out above
Bees$count<- 1
#dividing the sites up by the sampling time
Bees$sample_period<- substr(Bees$sample_start_date, 1,1)
Bees$Site_with_date<- paste(Bees$Site, Bees$sample_period, sep = "_")
#dividing the sites up by sampling time again for the sites data frame
Sites_full$sample_period<- substr(Sites$`Veg Sample Date`, 1,1)
Sites_full$Site_with_date<- paste(Sites_full$Site, Sites_full$sample_period, sep = "_")
#doing the same thing for the plant data
Plants$sample_period<- substr(Plants$`Floral Sample Date`, 1,1)
Plants$Site_with_date<- paste(Plants$Site, Plants$sample_period, sep = "_")
Plants$site_abundance<- rowSums(Plants[,3:231])
#putting the management type in the bee dataframe so that I could filter by type
Site_management<- select(Sites, Site, Management)
Bees<- merge(Bees, Site_management, by = "Site") #I don't know why when I merged these two data frames it tripled the amount of data... it was working normally before I knitted and then something went wrong
#Calculating mean abundances at each site
abundance<- Bees%>%group_by(Site_with_date)%>%summarise(sum(count))
mean_abundance<- mean(abundance$`sum(count)`)/3
se<- sd(abundance$`sum(count)`)/sqrt(120)/3
paste("The mean overall abundance (+/-SE) across all sites was", mean_abundance, "+/-", se)
## [1] "The mean overall abundance (+/-SE) across all sites was 105.391666666667 +/- 11.1546318514491"
#Calculating abundance at the agricultural sites
abundance_ag<- Bees%>%filter(Management == "Cultivated")%>%group_by(Site_with_date)%>%summarise(sum(count))
mean_ag_abundance<- mean(abundance_ag$`sum(count)`)/3
se_ag<- sd(abundance_ag$`sum(count)`)/sqrt(60)/3
paste("The mean abundance (+/-SE) at agricultural sites was", mean_ag_abundance, "+/-", se_ag)
## [1] "The mean abundance (+/-SE) at agricultural sites was 125.033333333333 +/- 19.1761676279809"
#Calculating abundance at the grassland sites
abundance_grass<- Bees%>%filter(Management == "Grassland")%>%group_by(Site_with_date)%>%summarise(sum(count))
mean_grass_abundance<- mean(abundance_grass$`sum(count)`)/3
se_grass<- sd(abundance_grass$`sum(count)`)/sqrt(60)/3 #These means were working before I knitted, but then it got tripled somehow so I had to divide by 3 to get the results I had before
paste("The mean abundance (+/-SE) at grassland sites was", mean_grass_abundance, "+/-", se_grass)
## [1] "The mean abundance (+/-SE) at grassland sites was 85.75 +/- 11.0053562973508"
#From the paper:
knitr::include_graphics("Screenshot 2020-04-16 17.37.51.png")

Replications/Reanalysis
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-6
library(reshape2)
#Inferential statistic - recreating the linear mixed model
site_species_matrix<- read_csv("Species_tbl.csv") #had to create a pivot table in excel to get the data in the right format to calculate the chao richness
## Parsed with column specification:
## cols(
## .default = col_double(),
## Site = col_character()
## )
## See spec(...) for full column specifications.
#calculating the chao species richness for pollinators and plants
chao<- estimateR(site_species_matrix[,-1]) #takes out the first column because you can't run the site in this function
chao<- as.data.frame(t(chao)) #transposes the list and makes in a data frame
chao%>% summarise(mean= mean(S.chao1)) #gets the mean richness
## mean
## 1 26.9845
chao<- cbind(chao, Sites_full$Site_with_date) #puts the site name in the dataframe
colnames(chao)<- c("S.obs.poll", "S.chao1.poll", "se.chao1.poll", "s.ACE.poll", "se.ACE.poll", "Site_with_date")
chao_plants<- estimateR(Plants[,3:231]) #calculates chao diversity
chao_plants<- as.data.frame(t(chao_plants)) #transposes the dataframe
chao_plants<- cbind(chao_plants, Plants$Site_with_date) #puts the site name in the dataframe
colnames(chao_plants)<- c("S.obs.plant", "S.chao1.plant", "se.chao1.plant", "s.ACE.plant", "se.ACE.plant", "Site_with_date") #changes the name of the column
#putting everything in one data frame so that I can run the model
full_dataset<- cbind(chao, chao_plants, Plants, Sites_full, abundance)
library(lme4)
## Loading required package: Matrix
#creating the model for abundance
abundance_model<- lmer(log(`sum(count)`+1) ~ Management + scale(site_abundance) + scale(S.chao1.plant) + scale(`% Vegetation`) + scale(`% Bare`) + scale(`% Canopy Cover`) + scale(`% Semi-Natural`)*Management + scale(`% Crop`) * Management + scale(`Heteogeniety (# Patches)`) * Management + (1|`Floral Sample Date`), full_dataset, na.action = na.fail)
summary(abundance_model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(`sum(count)` + 1) ~ Management + scale(site_abundance) +
## scale(S.chao1.plant) + scale(`% Vegetation`) + scale(`% Bare`) +
## scale(`% Canopy Cover`) + scale(`% Semi-Natural`) * Management +
## scale(`% Crop`) * Management + scale(`Heteogeniety (# Patches)`) *
## Management + (1 | `Floral Sample Date`)
## Data: full_dataset
##
## REML criterion at convergence: 303.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.11563 -0.53426 0.03391 0.57389 2.17020
##
## Random effects:
## Groups Name Variance Std.Dev.
## Floral Sample Date (Intercept) 0.1697 0.4119
## Residual 0.4939 0.7028
## Number of obs: 120, groups: Floral Sample Date, 37
##
## Fixed effects:
## Estimate Std. Error
## (Intercept) 5.32291 0.16489
## ManagementGrassland -0.16785 0.21494
## scale(site_abundance) -0.02444 0.08490
## scale(S.chao1.plant) 0.23223 0.09646
## scale(`% Vegetation`) 0.13732 0.10969
## scale(`% Bare`) 0.37221 0.09227
## scale(`% Canopy Cover`) -0.03025 0.09545
## scale(`% Semi-Natural`) 0.12497 0.12495
## scale(`% Crop`) -0.01290 0.13961
## scale(`Heteogeniety (# Patches)`) -0.52603 0.35055
## ManagementGrassland:scale(`% Semi-Natural`) 0.16929 0.18663
## ManagementGrassland:scale(`% Crop`) 0.22213 0.18035
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.92151 0.35000
## t value
## (Intercept) 32.281
## ManagementGrassland -0.781
## scale(site_abundance) -0.288
## scale(S.chao1.plant) 2.408
## scale(`% Vegetation`) 1.252
## scale(`% Bare`) 4.034
## scale(`% Canopy Cover`) -0.317
## scale(`% Semi-Natural`) 1.000
## scale(`% Crop`) -0.092
## scale(`Heteogeniety (# Patches)`) -1.501
## ManagementGrassland:scale(`% Semi-Natural`) 0.907
## ManagementGrassland:scale(`% Crop`) 1.232
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 2.633
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
coefs.abundance <- data.frame(coef(summary(abundance_model))) #lmer does not give p-values in the summary so these next few lines of code extract those from the model
coefs.abundance$p.z <- 2 * (1 - pnorm(abs(coefs.abundance$t.value)))
coefs.abundance
## Estimate Std..Error
## (Intercept) 5.32290571 0.16489331
## ManagementGrassland -0.16785082 0.21494257
## scale(site_abundance) -0.02443660 0.08489636
## scale(S.chao1.plant) 0.23223005 0.09645811
## scale(`% Vegetation`) 0.13732329 0.10969062
## scale(`% Bare`) 0.37220528 0.09226838
## scale(`% Canopy Cover`) -0.03025480 0.09544592
## scale(`% Semi-Natural`) 0.12496997 0.12495404
## scale(`% Crop`) -0.01290211 0.13961444
## scale(`Heteogeniety (# Patches)`) -0.52603281 0.35055456
## ManagementGrassland:scale(`% Semi-Natural`) 0.16928641 0.18663233
## ManagementGrassland:scale(`% Crop`) 0.22212928 0.18034905
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.92151430 0.35000224
## t.value p.z
## (Intercept) 32.28090781 0.000000e+00
## ManagementGrassland -0.78091008 4.348554e-01
## scale(site_abundance) -0.28784042 7.734689e-01
## scale(S.chao1.plant) 2.40757416 1.605890e-02
## scale(`% Vegetation`) 1.25191468 2.106010e-01
## scale(`% Bare`) 4.03394180 5.484891e-05
## scale(`% Canopy Cover`) -0.31698376 7.512559e-01
## scale(`% Semi-Natural`) 1.00012756 3.172488e-01
## scale(`% Crop`) -0.09241247 9.263703e-01
## scale(`Heteogeniety (# Patches)`) -1.50057330 1.334660e-01
## ManagementGrassland:scale(`% Semi-Natural`) 0.90705830 3.643760e-01
## ManagementGrassland:scale(`% Crop`) 1.23166314 2.180749e-01
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 2.63288111 8.466398e-03
#creating the model for chao richness
chao_model<- lmer(log(S.chao1.poll +1) ~ Management + scale(site_abundance) + scale(S.chao1.plant) + scale(`% Vegetation`) + scale(`% Bare`) + scale(`% Canopy Cover`) + scale(`% Semi-Natural`)*Management + scale(`% Crop`) * Management + scale(`Heteogeniety (# Patches)`) * Management + (1|`Floral Sample Date`), full_dataset, na.action = na.fail)
summary(chao_model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(S.chao1.poll + 1) ~ Management + scale(site_abundance) +
## scale(S.chao1.plant) + scale(`% Vegetation`) + scale(`% Bare`) +
## scale(`% Canopy Cover`) + scale(`% Semi-Natural`) * Management +
## scale(`% Crop`) * Management + scale(`Heteogeniety (# Patches)`) *
## Management + (1 | `Floral Sample Date`)
## Data: full_dataset
##
## REML criterion at convergence: 228.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.60851 -0.42122 -0.03315 0.48959 2.41552
##
## Random effects:
## Groups Name Variance Std.Dev.
## Floral Sample Date (Intercept) 0.007561 0.08695
## Residual 0.297224 0.54518
## Number of obs: 120, groups: Floral Sample Date, 37
##
## Fixed effects:
## Estimate Std. Error
## (Intercept) 3.173068 0.104325
## ManagementGrassland -0.023934 0.144643
## scale(site_abundance) -0.004540 0.061309
## scale(S.chao1.plant) 0.006175 0.067289
## scale(`% Vegetation`) 0.186654 0.078327
## scale(`% Bare`) 0.206448 0.066594
## scale(`% Canopy Cover`) -0.029901 0.066615
## scale(`% Semi-Natural`) 0.156205 0.085684
## scale(`% Crop`) 0.116333 0.095557
## scale(`Heteogeniety (# Patches)`) -0.004189 0.236473
## ManagementGrassland:scale(`% Semi-Natural`) -0.099410 0.134697
## ManagementGrassland:scale(`% Crop`) 0.031252 0.131905
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.179703 0.242659
## t value
## (Intercept) 30.415
## ManagementGrassland -0.165
## scale(site_abundance) -0.074
## scale(S.chao1.plant) 0.092
## scale(`% Vegetation`) 2.383
## scale(`% Bare`) 3.100
## scale(`% Canopy Cover`) -0.449
## scale(`% Semi-Natural`) 1.823
## scale(`% Crop`) 1.217
## scale(`Heteogeniety (# Patches)`) -0.018
## ManagementGrassland:scale(`% Semi-Natural`) -0.738
## ManagementGrassland:scale(`% Crop`) 0.237
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.741
##
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
coefs.chao <- data.frame(coef(summary(chao_model)))
coefs.chao$p.z <- 2 * (1 - pnorm(abs(coefs.chao$t.value)))
coefs.chao
## Estimate Std..Error
## (Intercept) 3.173068235 0.10432512
## ManagementGrassland -0.023933885 0.14464285
## scale(site_abundance) -0.004540263 0.06130907
## scale(S.chao1.plant) 0.006174961 0.06728907
## scale(`% Vegetation`) 0.186653511 0.07832737
## scale(`% Bare`) 0.206447633 0.06659360
## scale(`% Canopy Cover`) -0.029901242 0.06661485
## scale(`% Semi-Natural`) 0.156204790 0.08568365
## scale(`% Crop`) 0.116333394 0.09555702
## scale(`Heteogeniety (# Patches)`) -0.004189055 0.23647281
## ManagementGrassland:scale(`% Semi-Natural`) -0.099410481 0.13469699
## ManagementGrassland:scale(`% Crop`) 0.031252088 0.13190516
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.179702669 0.24265928
## t.value p.z
## (Intercept) 30.41518903 0.000000000
## ManagementGrassland -0.16546885 0.868574967
## scale(site_abundance) -0.07405533 0.940966358
## scale(S.chao1.plant) 0.09176766 0.926882642
## scale(`% Vegetation`) 2.38299221 0.017172558
## scale(`% Bare`) 3.10011241 0.001934472
## scale(`% Canopy Cover`) -0.44886752 0.653527225
## scale(`% Semi-Natural`) 1.82304078 0.068297214
## scale(`% Crop`) 1.21742383 0.223443003
## scale(`Heteogeniety (# Patches)`) -0.01771474 0.985866418
## ManagementGrassland:scale(`% Semi-Natural`) -0.73803045 0.460495943
## ManagementGrassland:scale(`% Crop`) 0.23692848 0.812712279
## ManagementGrassland:scale(`Heteogeniety (# Patches)`) 0.74055551 0.458962990
knitr::include_graphics("Screenshot 2020-04-16 18.21.08.png")

#The summary and p-values are pretty close to their final model. I was not able to include nest site within city, as the paper did because they did not include that in their dataset that I pulled from online.
#to make figure 3 I need to separate the above and below ground nesting bees and the small and large bees
#grouping the bees
log_abundance<- Bees%>%group_by(Site_with_date)%>%summarise(log(sum(count)))
log_abundance_bare<- merge(log_abundance, Sites_full, by = "Site_with_date")
#plots the log abundance vs bare ground cover
abundance_bare <- ggplot(data = log_abundance_bare, aes(x = `% Bare`, y = `log(sum(count))`)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="bare ground (%)", y ="log(bee abundance)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
abundance_bare

#gets richness with bare ground
chao$log_richness<- log(chao$S.chao1.poll)
log_richness_bare<- merge(chao, Sites_full, by = "Site_with_date")
richness_bare <- ggplot(data = log_richness_bare, aes(x = `% Bare`, y = log_richness)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="bare ground (%)", y ="log(bee richness)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
richness_bare

#log of only above-ground bee abundance vs floral richness
log_abundance_above<- Bees%>%filter(nest_location == "above")%>%group_by(Site_with_date)%>%summarise(log(sum(count)))
log_abundance_above_floralrichness<- merge(log_abundance_above, chao_plants, by = "Site_with_date")
above_abundance <- ggplot(data = log_abundance_above_floralrichness, aes(x = S.chao1.plant, y = `log(sum(count))`)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="floral richness", y ="log(above-ground \n abundance)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
above_abundance

#log of only large bee abundance vs Bare ground
log_abundance_large<- Bees%>%filter(size == "large")%>%group_by(Site_with_date)%>%summarise(log(sum(count)))
log_abundance_large_bare<- merge(log_abundance_large, Sites_full, by = "Site_with_date")
large_abundance <- ggplot(data = log_abundance_large_bare, aes(x = `% Bare`, y = `log(sum(count))`)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="bare ground (%)", y ="log(large bee \n abundance)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
large_abundance

#log of only below groumd bees richness vs bare ground
below_bees<- read_csv("below_bees.csv") #had to make seperate pivot tables in excel for the below ground nesting bees because I couldn't figure out a way to do that in R
## Parsed with column specification:
## cols(
## .default = col_double(),
## Site_with_date = col_character()
## )
## See spec(...) for full column specifications.
chao.below.bees<- estimateR(below_bees[,-1])
chao.below.bees<- as.data.frame(t(chao.below.bees))
chao.below.bees<- cbind(chao.below.bees, below_bees$Site_with_date) #puts the site name in the dataframe
colnames(chao.below.bees)[6]<- "Site_with_date"
chao.below.bees$log_richness<- log(chao.below.bees$S.chao1)
log_richness_below<- merge(chao.below.bees, Sites_full, by = "Site_with_date")
below_richness <- ggplot(data = log_richness_below, aes(x = `% Bare`, y = log_richness)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="bare ground (%)", y ="log(below-ground \n richness)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
below_richness

#log of only below groumd bees richness vs bare ground
small_bees<- read_csv("small_bees.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## Site_with_date = col_character()
## )
## See spec(...) for full column specifications.
chao.small.bees<- estimateR(small_bees[,-1])
chao.small.bees<- as.data.frame(t(chao.small.bees))
chao.small.bees<- cbind(chao.small.bees, small_bees$Site_with_date) #puts the site name in the dataframe
colnames(chao.small.bees)[6]<- "Site_with_date"
chao.small.bees$log_richness<- log(chao.small.bees$S.chao1)
log_richness_small<- merge(chao.small.bees, chao_plants, by = "Site_with_date")
small_richness <- ggplot(data = log_richness_small, aes(x = S.chao1.plant, y = log_richness)) + geom_point(na.rm = TRUE, shape =1, size = 3) + geom_smooth(method = "lm", se = FALSE, na.rm = TRUE, color = "black") + labs(x="floral richness", y ="log(small bee \n richness)") + theme(plot.title = element_text(hjust = 0.5, size = 12), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
small_richness

library(patchwork)
(abundance_bare|above_abundance|large_abundance)/(richness_bare|below_richness|small_richness)

#from the actual paper:
knitr::include_graphics("Screenshot 2020-04-16 17.37.08.png")

Summary/Discussion
I think the descriptive statistics at the beginning, although off by a couple of decimal points, were fairly accurately replicated. One problem I encountered when replicating this was that the data set that I downloaded offline did not have the nesting location (below or above) or size (large or small) in it, but I did find those listings in the supplemental information, but not in an excel sheet, so I had to copy it over to the main bees data sheet. While this was not hard to do, there were a couple of species missing in the supplemental information, so I had to write them in as NAs, which the authors might not have had to do, so that might be why the numbers in the mean abundances were slightly off.
When creating the linear models, the authors state that they used site nested within city as a random effect, but again the city was not included in the sites dataset as a variable, so I was not able to include this in the model. With that being said, I think that the esitmates from the model in the summary output are pretty close to the actual paper.
Lastly, with the visualization of firgure 3, I think I was able to recreate this pretty accurately. I did have to go into excel to create pivot tables to rerun the chao estimator on the populations this way. Whehn doing this, if the sites did not have any small or below ground nesting bees, for example, then some of the points were missing from the plot that I think they may have included as zeros.
References
knitr::include_graphics("Ballare_et_al-2019-Ecological_Applications.pdf")
Ballare, K. M., Neff, J. L., Ruppel, R., & Jha, S. (2019). Multiāscalar drivers of biodiversity: local management mediates wild bee community response to regional urbanization. Ecological applications, 29(3), e01869.